<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <!-- MathJax -->
    <script type="text/javascript"
      src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
    </script>
    <meta http-equiv="X-UA-Compatible" content="chrome=1">
    <title>
      Caffe | Convolution Layer
    </title>

    <link rel="icon" type="image/png" href="/images/caffeine-icon.png">

    <link rel="stylesheet" href="/stylesheets/reset.css">
    <link rel="stylesheet" href="/stylesheets/styles.css">
    <link rel="stylesheet" href="/stylesheets/pygment_trac.css">

    <meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
    <!--[if lt IE 9]>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv.min.js"></script>
    <![endif]-->
  </head>
  <body>
  <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-46255508-1', 'daggerfs.com');
    ga('send', 'pageview');
  </script>
    <div class="wrapper">
      <header>
        <h1 class="header"><a href="/">Caffe</a></h1>
        <p class="header">
          Deep learning framework by <a class="header name" href="http://bair.berkeley.edu/">BAIR</a>
        </p>
        <p class="header">
          Created by
          <br>
          <a class="header name" href="http://daggerfs.com/">Yangqing Jia</a>
          <br>
          Lead Developer
          <br>
          <a class="header name" href="http://imaginarynumber.net/">Evan Shelhamer</a>
        </p>
        <ul>
          <li>
            <a class="buttons github" href="https://github.com/BVLC/caffe">View On GitHub</a>
          </li>
        </ul>
      </header>
      <section>

      <h1 id="convolution-layer">Convolution Layer</h1>

<ul>
  <li>Layer type: <code class="highlighter-rouge">Convolution</code></li>
  <li><a href="http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1ConvolutionLayer.html">Doxygen Documentation</a></li>
  <li>Header: <a href="https://github.com/BVLC/caffe/blob/master/include/caffe/layers/conv_layer.hpp"><code class="highlighter-rouge">./include/caffe/layers/conv_layer.hpp</code></a></li>
  <li>CPU implementation: <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cpp"><code class="highlighter-rouge">./src/caffe/layers/conv_layer.cpp</code></a></li>
  <li>CUDA GPU implementation: <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu"><code class="highlighter-rouge">./src/caffe/layers/conv_layer.cu</code></a></li>
  <li>Input
    <ul>
      <li><code class="highlighter-rouge">n * c_i * h_i * w_i</code></li>
    </ul>
  </li>
  <li>Output
    <ul>
      <li><code class="highlighter-rouge">n * c_o * h_o * w_o</code>, where <code class="highlighter-rouge">h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1</code> and <code class="highlighter-rouge">w_o</code> likewise.</li>
    </ul>
  </li>
</ul>
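<p>The output-size formula above can be restated as a small helper (plain Python; the sample sizes are illustrative, not taken from any particular model):</p>

```python
def conv_output_dim(input_dim, kernel, pad=0, stride=1):
    """Output spatial size: (input + 2*pad - kernel) // stride + 1."""
    return (input_dim + 2 * pad - kernel) // stride + 1

# A 32-pixel input with a 5x5 kernel, pad 2, stride 1 keeps its size:
print(conv_output_dim(32, 5, pad=2, stride=1))    # 32
# An 11x11 kernel with stride 4 on a 227-pixel input shrinks it:
print(conv_output_dim(227, 11, pad=0, stride=4))  # 55
```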

<p>The <code class="highlighter-rouge">Convolution</code> layer convolves the input image with a set of learnable filters, each producing one feature map in the output image.</p>
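<p>A minimal direct sketch of that computation in NumPy, one feature map per filter. This is only an illustration of the arithmetic, not Caffe's actual im2col-based C++ implementation; like Caffe, it computes cross-correlation (filters are not flipped):</p>

```python
import numpy as np

def conv2d_forward(x, weights, bias, stride=1, pad=0):
    """Naive direct convolution.

    x:       input of shape (c_i, h_i, w_i)
    weights: filters of shape (c_o, c_i, kernel_h, kernel_w)
    bias:    shape (c_o,)
    Returns output of shape (c_o, h_o, w_o): one feature map per filter.
    """
    c_o, c_i, kh, kw = weights.shape
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero-pad spatial dims
    h_o = (x.shape[1] - kh) // stride + 1
    w_o = (x.shape[2] - kw) // stride + 1
    out = np.empty((c_o, h_o, w_o))
    for o in range(c_o):                  # each filter yields one output map
        for i in range(h_o):
            for j in range(w_o):
                patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
                out[o, i, j] = np.sum(patch * weights[o]) + bias[o]
    return out

x = np.random.randn(3, 8, 8)     # 3-channel 8x8 input
w = np.random.randn(2, 3, 3, 3)  # two 3x3 filters over all 3 input channels
out = conv2d_forward(x, w, np.zeros(2), stride=1, pad=1)
print(out.shape)  # (2, 8, 8)
```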

<h2 id="sample">Sample</h2>

<p>Sample (as seen in <a href="https://github.com/BVLC/caffe/blob/master/models/bvlc_reference_caffenet/train_val.prototxt"><code class="highlighter-rouge">./models/bvlc_reference_caffenet/train_val.prototxt</code></a>):</p>

<div class="highlighter-rouge"><pre class="highlight"><code>  layer {
    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
    # learning rate and decay multipliers for the filters
    param { lr_mult: 1 decay_mult: 1 }
    # learning rate and decay multipliers for the biases
    param { lr_mult: 2 decay_mult: 0 }
    convolution_param {
      num_output: 96     # learn 96 filters
      kernel_size: 11    # each filter is 11x11
      stride: 4          # step 4 pixels between each filter application
      weight_filler {
        type: "gaussian" # initialize the filters from a Gaussian
        std: 0.01        # distribution with stdev 0.01 (default mean: 0)
      }
      bias_filler {
        type: "constant" # initialize the biases to zero (0)
        value: 0
      }
    }
  }
</code></pre>
</div>
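<p>A quick sanity check of how many parameters the layer above learns. The 3-channel (RGB) input is an assumption about the preceding data layer, not something stated in this snippet:</p>

```python
num_output, channels, kernel = 96, 3, 11   # from the conv1 definition above
weights = num_output * channels * kernel * kernel  # 96 filters of 3x11x11
biases = num_output                                # one bias per filter
print(weights + biases)  # 34944
```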

<h2 id="parameters">Parameters</h2>
<ul>
  <li>Parameters (<code class="highlighter-rouge">ConvolutionParameter convolution_param</code>)
    <ul>
      <li>Required
        <ul>
          <li><code class="highlighter-rouge">num_output</code> (<code class="highlighter-rouge">c_o</code>): the number of filters</li>
          <li><code class="highlighter-rouge">kernel_size</code> (or <code class="highlighter-rouge">kernel_h</code> and <code class="highlighter-rouge">kernel_w</code>): specifies the height and width of each filter</li>
        </ul>
      </li>
      <li>Strongly Recommended
        <ul>
          <li><code class="highlighter-rouge">weight_filler</code> [default <code class="highlighter-rouge">type: 'constant' value: 0</code>]</li>
        </ul>
      </li>
      <li>Optional
        <ul>
          <li><code class="highlighter-rouge">bias_term</code> [default <code class="highlighter-rouge">true</code>]: specifies whether to learn and apply a set of additive biases to the filter outputs</li>
          <li><code class="highlighter-rouge">pad</code> (or <code class="highlighter-rouge">pad_h</code> and <code class="highlighter-rouge">pad_w</code>) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input</li>
          <li><code class="highlighter-rouge">stride</code> (or <code class="highlighter-rouge">stride_h</code> and <code class="highlighter-rouge">stride_w</code>) [default 1]: specifies the intervals at which to apply the filters to the input</li>
          <li><code class="highlighter-rouge">group</code> (g) [default 1]: if g &gt; 1, the connectivity of each filter is restricted to a subset of the input. Specifically, the input and output channels are separated into g groups, and the <script type="math/tex">i</script>th group of output channels connects only to the <script type="math/tex">i</script>th group of input channels.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>From <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto"><code class="highlighter-rouge">./src/caffe/proto/caffe.proto</code></a>:</li>
</ul>
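<p>The grouping rule can be sketched as a channel-connectivity map (plain Python; the channel counts are chosen for illustration only):</p>

```python
def group_connectivity(c_i, c_o, g):
    """Map each output channel to the input channels it may read.

    Input and output channels are split into g equal contiguous groups;
    output channels in group i connect only to input channels in group i.
    """
    assert c_i % g == 0 and c_o % g == 0, "channel counts must divide by g"
    in_per, out_per = c_i // g, c_o // g
    return {o: list(range((o // out_per) * in_per,
                          (o // out_per) * in_per + in_per))
            for o in range(c_o)}

# 4 input channels, 6 output channels, 2 groups:
print(group_connectivity(4, 6, 2))
# {0: [0, 1], 1: [0, 1], 2: [0, 1], 3: [2, 3], 4: [2, 3], 5: [2, 3]}
```

<p>With g = 1 every filter sees every input channel; with g = c_i = c_o each filter sees exactly one channel (depthwise convolution).</p>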

<figure class="highlight"><pre><code class="language-protobuf" data-lang="protobuf"><span class="kd">message</span> <span class="nc">ConvolutionParameter</span> <span class="p">{</span>
  <span class="k">optional</span> <span class="kt">uint32</span> <span class="na">num_output</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// The number of outputs for the layer
</span>  <span class="k">optional</span> <span class="kt">bool</span> <span class="na">bias_term</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">[</span><span class="k">default</span> <span class="o">=</span> <span class="kc">true</span><span class="p">];</span> <span class="c1">// whether to have bias terms
</span>
  <span class="c1">// Pad, kernel size, and stride are all given as a single value for equal
</span>  <span class="c1">// dimensions in all spatial dimensions, or once per spatial dimension.
</span>  <span class="k">repeated</span> <span class="kt">uint32</span> <span class="na">pad</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span> <span class="c1">// The padding size; defaults to 0
</span>  <span class="k">repeated</span> <span class="kt">uint32</span> <span class="na">kernel_size</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span> <span class="c1">// The kernel size
</span>  <span class="k">repeated</span> <span class="kt">uint32</span> <span class="na">stride</span> <span class="o">=</span> <span class="mi">6</span><span class="p">;</span> <span class="c1">// The stride; defaults to 1
</span>  <span class="c1">// Factor used to dilate the kernel, (implicitly) zero-filling the resulting
</span>  <span class="c1">// holes. (Kernel dilation is sometimes referred to by its use in the
</span>  <span class="c1">// algorithme à trous from Holschneider et al. 1987.)
</span>  <span class="k">repeated</span> <span class="kt">uint32</span> <span class="na">dilation</span> <span class="o">=</span> <span class="mi">18</span><span class="p">;</span> <span class="c1">// The dilation; defaults to 1
</span>
  <span class="c1">// For 2D convolution only, the *_h and *_w versions may also be used to
</span>  <span class="c1">// specify both spatial dimensions.
</span>  <span class="k">optional</span> <span class="kt">uint32</span> <span class="na">pad_h</span> <span class="o">=</span> <span class="mi">9</span> <span class="p">[</span><span class="k">default</span> <span class="o">=</span> <span class="mi">0</span><span class="p">];</span> <span class="c1">// The padding height (2D only)
</span>  <span class="k">optional</span> <span class="kt">uint32</span> <span class="na">pad_w</span> <span class="o">=</span> <span class="mi">10</span> <span class="p">[</span><span class="k">default</span> <span class="o">=</span> <span class="mi">0</span><span class="p">];</span> <span class="c1">// The padding width (2D only)
</span>  <span class="k">optional</span> <span class="kt">uint32</span> <span class="na">kernel_h</span> <span class="o">=</span> <span class="mi">11</span><span class="p">;</span> <span class="c1">// The kernel height (2D only)
</span>  <span class="k">optional</span> <span class="kt">uint32</span> <span class="na">kernel_w</span> <span class="o">=</span> <span class="mi">12</span><span class="p">;</span> <span class="c1">// The kernel width (2D only)
</span>  <span class="k">optional</span> <span class="kt">uint32</span> <span class="na">stride_h</span> <span class="o">=</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">// The stride height (2D only)
</span>  <span class="k">optional</span> <span class="kt">uint32</span> <span class="na">stride_w</span> <span class="o">=</span> <span class="mi">14</span><span class="p">;</span> <span class="c1">// The stride width (2D only)
</span>
  <span class="k">optional</span> <span class="kt">uint32</span> <span class="kd">group</span> <span class="o">=</span> <span class="mi">5</span> <span class="p">[</span><span class="k">default</span> <span class="o">=</span> <span class="mi">1</span><span class="p">];</span> <span class="c1">// The group size for group conv
</span>
  <span class="k">optional</span> <span class="n">FillerParameter</span> <span class="na">weight_filler</span> <span class="o">=</span> <span class="mi">7</span><span class="p">;</span> <span class="c1">// The filler for the weight
</span>  <span class="k">optional</span> <span class="n">FillerParameter</span> <span class="na">bias_filler</span> <span class="o">=</span> <span class="mi">8</span><span class="p">;</span> <span class="c1">// The filler for the bias
</span>  <span class="kd">enum</span> <span class="n">Engine</span> <span class="p">{</span>
    <span class="na">DEFAULT</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="na">CAFFE</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="na">CUDNN</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="k">optional</span> <span class="n">Engine</span> <span class="na">engine</span> <span class="o">=</span> <span class="mi">15</span> <span class="p">[</span><span class="k">default</span> <span class="o">=</span> <span class="n">DEFAULT</span><span class="p">];</span>

  <span class="c1">// The axis to interpret as "channels" when performing convolution.
</span>  <span class="c1">// Preceding dimensions are treated as independent inputs;
</span>  <span class="c1">// succeeding dimensions are treated as "spatial".
</span>  <span class="c1">// With (N, C, H, W) inputs, and axis == 1 (the default), we perform
</span>  <span class="c1">// N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for
</span>  <span class="c1">// groups g&gt;1) filters across the spatial axes (H, W) of the input.
</span>  <span class="c1">// With (N, C, D, H, W) inputs, and axis == 1, we perform
</span>  <span class="c1">// N independent 3D convolutions, sliding (C/g)-channels
</span>  <span class="c1">// filters across the spatial axes (D, H, W) of the input.
</span>  <span class="k">optional</span> <span class="kt">int32</span> <span class="na">axis</span> <span class="o">=</span> <span class="mi">16</span> <span class="p">[</span><span class="k">default</span> <span class="o">=</span> <span class="mi">1</span><span class="p">];</span>

  <span class="c1">// Whether to force use of the general ND convolution, even if a specific
</span>  <span class="c1">// implementation for blobs of the appropriate number of spatial dimensions
</span>  <span class="c1">// is available. (Currently, there is only a 2D-specific convolution
</span>  <span class="c1">// implementation; for input blobs with num_axes != 2, this option is
</span>  <span class="c1">// ignored and the ND implementation will be used.)
</span>  <span class="k">optional</span> <span class="kt">bool</span> <span class="na">force_nd_im2col</span> <span class="o">=</span> <span class="mi">17</span> <span class="p">[</span><span class="k">default</span> <span class="o">=</span> <span class="kc">false</span><span class="p">];</span>
<span class="p">}</span></code></pre></figure>



      </section>
  </div>
  </body>
</html>
